ABSTRACT. The Regular Post Embedding Problem extended with partial (co)directness is shown decidable.
This extends to universal and/or counting versions. It is also shown that combining directness and codirectness
in Post Embedding problems leads to undecidability.

1 Introduction

The Regular Post Embedding Problem (PEP for short, named by analogy with Post's Correspondence
Problem) is the problem of deciding, given two morphisms on words $u,v : \Sigma^* \to \Gamma^*$ and a
regular language $R \in \mathrm{Reg}(\Sigma)$, whether there is $\sigma \in R$ such that $u(\sigma)$ is a
(scattered) subword of $v(\sigma)$.

The subword ordering, also called embedding, is denoted "$\sqsubseteq$": by definition,
$u(\sigma) \sqsubseteq v(\sigma)$ holds if $u(\sigma)$ can be obtained by erasing some letters from
$v(\sigma)$, possibly all of them, possibly none. Equivalently, PEP is the question whether a rational
relation, or a transduction, $T \subseteq \Gamma^* \times \Gamma^*$ intersects non-vacuously the subword
relation, hence is a special case of the intersection problem for two rational relations.
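To make the definition concrete, the subword test and a naive bounded search for PEP solutions can be sketched in Python as follows. This is only an illustration and not part of the decidability argument: the function names, the use of a Python regular expression to stand for $R$, and the length bound `max_len` are assumptions of this sketch.

```python
import re
from itertools import product


def is_subword(s: str, t: str) -> bool:
    """True if s is a scattered subword (subsequence) of t."""
    it = iter(t)
    # 'ch in it' consumes the iterator up to the first match, so order is respected.
    return all(ch in it for ch in s)


def image(h: dict, word: str) -> str:
    """Apply the morphism h (given letter by letter) to a word."""
    return "".join(h[a] for a in word)


def find_pep_solution(u, v, regex, alphabet, max_len):
    """Return the first sigma in R (up to max_len) with u(sigma) a subword of v(sigma)."""
    pattern = re.compile(regex)
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            sigma = "".join(letters)
            if pattern.fullmatch(sigma) and is_subword(image(u, sigma), image(v, sigma)):
                return sigma
    return None  # no solution of length <= max_len


# Hypothetical toy instance: Sigma = {a, b}, Gamma = {x, y}, R = a+b*.
u = {"a": "xy", "b": ""}
v = {"a": "x", "b": "yx"}
print(find_pep_solution(u, v, "a+b*", "ab", 4))
```

Such a search only semi-decides the problem unless a bound on the length of solutions is known, which is exactly what the cutting technique of this paper provides.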

This problem is new and quite remarkable: it is decidable [2] but surprisingly hard since it is not
primitive-recursive and not even multiply-recursive. In fact, it is at level $\mathcal{F}_{\omega^\omega}$
(and not below) in the Fast-Growing Hierarchy [8, 12].

A variant problem, $\mathrm{PEP}_{\mathrm{dir}}$, asks for the existence of direct solutions, i.e.,
solutions $\sigma \in R$ such that $u(\tau) \sqsubseteq v(\tau)$ for every prefix $\tau$ of $\sigma$.
The two problems are inter-reducible [2], hence have the same complexity: decidability of PEP entails
decidability of $\mathrm{PEP}_{\mathrm{dir}}$, while hardness of $\mathrm{PEP}_{\mathrm{dir}}$ entails
hardness for PEP.

Our contribution. We introduce $\mathrm{PEP}^{\mathrm{partial}}_{\mathrm{dir}}$, or "PEP with partial
directness", a new problem that generalizes both PEP and $\mathrm{PEP}_{\mathrm{dir}}$, and prove its
decidability. The proof combines two ideas. Firstly, by Higman's Lemma, a long solution must eventually
contain "comparable" so-called cutting points, from which one deduces that the solution is not minimal
(or unique, or ...). Secondly, the above notion of "eventually", that comes from Higman's Lemma, can be
turned into an effective upper bound thanks to a Length Function Theorem. The cutting technique described
above was first used in [7] for reducing $\exists^\infty\mathrm{PEP}$ to PEP. In this paper we use it to
obtain a decidability proof for $\mathrm{PEP}^{\mathrm{partial}}_{\mathrm{dir}}$ that is not only more
general but also more direct than the earlier proofs for PEP or $\mathrm{PEP}_{\mathrm{dir}}$. It also
immediately provides an $\mathcal{F}_{\omega^\omega}$ complexity upper bound. We also show the decidability
of universal and/or counting versions of the extended $\mathrm{PEP}^{\mathrm{partial}}_{\mathrm{dir}}$
problem, and explain how our attempts at further generalisation, most notably by considering the
combination of directness and codirectness in a same instance, lead to undecidability.



*Supported by ARCUS 2008-11 Île de France-Inde and Grant ANR-11-BS02-001. The first author was partially
funded by Tata Consultancy Services.






Cutting Through Regular Post Embedding Problems 



Applications to channel machines. Beyond the tantalizing decidability questions, our interest in 
PEP and its variants comes from their close connection with fifo channel machines ifTTTl . a family 
of computational models that are a central tool in several areas of program and system verification 
(see and the references therein). Here, PEP and its variants provide abstract versions of verification
problems for channel machines [4], bringing greater clarity and versatility in both decidability
and undecidability (more generally, hardness) proofs.

Beyond providing a uniform and simpler proof for the decidability of PEP and $\mathrm{PEP}_{\mathrm{dir}}$,
our motivation for considering $\mathrm{PEP}^{\mathrm{partial}}_{\mathrm{dir}}$ is that it allows solving
the decidability of UCST, i.e., unidirectional channel systems (with one reliable and one lossy channel)
extended with the possibility of testing the contents of channels [10]. We recall that PEP was introduced
for UCS, unidirectional channel systems where tests on channels are not supported [2, 3], and that
$\mathrm{PEP}_{\mathrm{dir}}$ corresponds to LCS, i.e., lossy channel systems, for which verification is
decidable using techniques from WSTS theory [1, 9].

The following figure illustrates the resulting situation. 

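[Figure: UCST corresponds to $\mathrm{PEP}^{\mathrm{partial}}_{\mathrm{dir}}$, decidable via cuttings (this
paper); UCS corresponds to PEP, decidable via blockers; LCS corresponds to $\mathrm{PEP}_{\mathrm{dir}}$,
decidable by WSTS theory; PEP and $\mathrm{PEP}_{\mathrm{dir}}$ are linked by two-way reductions.]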

Outline of the paper. Section 2 recalls basic notations and definitions. In particular, it explains
the Length Function Theorem for Higman's Lemma, and lists basic results where the subword relation
interacts with concatenations and factorization. Section 3 contains our main result, a direct
decidability proof for $\mathrm{PEP}^{\mathrm{partial}}_{\mathrm{dir}}$, a problem subsuming both PEP and
$\mathrm{PEP}_{\mathrm{dir}}$. Section 4 builds on this result and shows the decidability of counting
problems on $\mathrm{PEP}^{\mathrm{partial}}_{\mathrm{dir}}$. Section 5 further shows the decidability of
universal variants of these questions. Section 6 contains our undecidability results for extensions of
$\mathrm{PEP}^{\mathrm{partial}}_{\mathrm{dir}}$. A technical appendix provides all the proofs not given in
the main text.

2 Basic notation and definitions 

Words. Concatenation of words is denoted multiplicatively, with $\varepsilon$ denoting the empty word.
If $s$ is a prefix of a word $t$, $s^{-1}t$ denotes the unique word $s'$ such that $t = ss'$, and
$s^{-1}t$ is not defined if $s$ is not a prefix of $t$. Similarly, when $s$ is a suffix of $t$, $ts^{-1}$
is $t$ with the $s$ suffix removed. For a word $x = a_0 \ldots a_{n-1}$,
$\tilde{x} \stackrel{\text{def}}{=} a_{n-1} \ldots a_0$ is the mirrored word. The mirror of a language $R$
is $\tilde{R} \stackrel{\text{def}}{=} \{\tilde{x} \mid x \in R\}$.
We write $s \sqsubseteq t$ when $s$ is a subword (subsequence) of $t$.

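LEMMA 1. (Subwords and concatenation, see App. B.1) For all words $y,z,s,t$:

1. If $yz \sqsubseteq st$ then $y \sqsubseteq s$ or $z \sqsubseteq t$.

2. If $yz \sqsubseteq st$ and $z \sqsubseteq t$ and $x$ is the longest suffix of $y$ such that $xz \sqsubseteq t$, then $yx^{-1} \sqsubseteq s$.

3. If $yz \sqsubseteq st$ and $z \not\sqsubseteq t$ and $x$ is the shortest prefix of $z$ such that $x^{-1}z \sqsubseteq t$, then $yx \sqsubseteq s$.

4. If $yz \sqsubseteq st$ and $z \sqsubseteq t$ and $x$ is the longest prefix of $t$ such that $z \sqsubseteq x^{-1}t$, then $y \sqsubseteq sx$.

5. If $yz \sqsubseteq st$ and $z \not\sqsubseteq t$ and $x$ is the shortest suffix of $s$ such that $z \sqsubseteq xt$, then $y \sqsubseteq sx^{-1}$.

6. If $sx \sqsubseteq yt$ and $t \sqsubseteq s$, then $sx^k \sqsubseteq y^k t$ for all $k \geq 1$.

7. If $xs \sqsubseteq ty$ and $t \sqsubseteq s$, then $x^k s \sqsubseteq t y^k$ for all $k \geq 1$.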
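Statements of this kind are easy to sanity-check by exhaustive enumeration over short words. The following Python sketch does so for items 1 and 6; the helper names, the two-letter alphabet and the bounds on word length and on $k$ are assumptions made for the illustration, and such a check is of course no substitute for the proofs in the appendix.

```python
from itertools import product


def is_subword(s, t):
    """True if s is a scattered subword of t."""
    it = iter(t)
    return all(ch in it for ch in s)


def words(alphabet, max_len):
    """All words over the alphabet of length at most max_len."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)


def check_item1(alphabet="ab", max_len=2):
    """Item 1: yz <= st implies y <= s or z <= t."""
    ws = list(words(alphabet, max_len))
    return all(is_subword(y, s) or is_subword(z, t)
               for y in ws for z in ws for s in ws for t in ws
               if is_subword(y + z, s + t))


def check_item6(alphabet="ab", max_len=2, max_k=3):
    """Item 6: sx <= yt and t <= s imply s x^k <= y^k t for k >= 1."""
    ws = list(words(alphabet, max_len))
    return all(is_subword(s + x * k, y * k + t)
               for s in ws for x in ws for y in ws for t in ws
               if is_subword(s + x, y + t) and is_subword(t, s)
               for k in range(1, max_k + 1))


print(check_item1(), check_item6())
```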
With a language $R$ one associates a congruence (wrt concatenation) given by
$s \sim_R t \stackrel{\text{def}}{\Leftrightarrow} \forall x,y\,(xsy \in R \Leftrightarrow xty \in R)$ and
called the syntactic congruence (its quotient is the syntactic monoid). This congruence has
finite index if (and only if) $R$ is regular. For regular $R$, let $n_R$ denote this index: $n_R \leq m^m$
when $R$ is recognized by an $m$-state complete DFA.
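The bound $n_R \leq m^m$ can be made tangible by computing the transition monoid of a DFA, i.e., the set of state-to-state maps induced by all input words: there are at most $m^m$ such maps, and two words inducing the same map are syntactically congruent. The sketch below is an illustration under our own conventions (states are numbered $0,\ldots,m-1$ and `delta[q][a]` is the successor of state `q` on letter `a`); these names are not taken from the paper.

```python
def transition_monoid(delta, alphabet):
    """Return the set of maps on states induced by all words of a complete DFA."""
    m = len(delta)
    identity = tuple(range(m))                     # map induced by the empty word
    generators = [tuple(delta[q][a] for q in range(m)) for a in alphabet]
    seen = {identity}
    frontier = [identity]
    while frontier:
        f = frontier.pop()
        for g in generators:
            h = tuple(g[f[q]] for q in range(m))   # read the word of f, then one more letter
            if h not in seen:
                seen.add(h)
                frontier.append(h)
    return seen


# Hypothetical example: words over {a, b} with an even number of a's (2 states).
delta = [{"a": 1, "b": 0}, {"a": 0, "b": 1}]
monoid = transition_monoid(delta, "ab")
print(len(monoid), "<=", len(delta) ** len(delta))
```

For a minimal complete DFA the transition monoid is isomorphic to the syntactic monoid, so its size is exactly $n_R$; for an arbitrary DFA it only gives an upper bound.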

Higman's Lemma. It is well-known that for words over a finite alphabet, $\sqsubseteq$ is a
well-quasi-ordering, that is, any infinite sequence of words $x_1, x_2, x_3, \ldots$ contains an infinite
increasing subsequence $x_{i_1} \sqsubseteq x_{i_2} \sqsubseteq x_{i_3} \sqsubseteq \cdots$. This result is
called Higman's Lemma.

For $n \in \mathbb{N}$, we say that a sequence (finite or infinite) of words is $n$-good if it has an
increasing subsequence of length $n$. It is $n$-bad otherwise. Higman's Lemma tells us that every infinite
sequence is $n$-good for every $n$. Hence every $n$-bad sequence is finite.
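For finite sequences, $n$-goodness can be tested directly. The following Python sketch computes the length of the longest increasing subsequence with respect to $\sqsubseteq$ by dynamic programming, which is sound because $\sqsubseteq$ is transitive. The function names are ours and only serve the illustration.

```python
def is_subword(s, t):
    """True if s is a scattered subword of t."""
    it = iter(t)
    return all(ch in it for ch in s)


def longest_increasing_chain(seq):
    """Length of the longest subsequence x_{i1} <= x_{i2} <= ... for the subword ordering."""
    best = []  # best[j] = length of the longest increasing chain ending at seq[j]
    for j, x in enumerate(seq):
        prev = (best[i] for i in range(j) if is_subword(seq[i], x))
        best.append(1 + max(prev, default=0))
    return max(best, default=0)


def is_good(seq, n):
    """True if the finite sequence seq is n-good."""
    return longest_increasing_chain(seq) >= n


print(is_good(["ab", "b", "bab", "abab"], 3))  # ab <= bab <= abab
print(is_good(["bb", "b", ""], 2))             # strictly decreasing, hence 2-bad
```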

It is often said that Higman's Lemma is "non-effective" since it does not give any explicit
information on the maximal length of bad sequences. Consequently, when one uses Higman's Lemma
to prove that an algorithm terminates, no meaningful upper bound on the algorithm's running time is
derived from the proof. However, complexity upper bounds can be derived if the complexity of the
sequences (or more precisely of the process that generates bad sequences) is taken into account. The
interested reader can consult [12] for more details. Here we only need the simplest version of these
results, i.e., the statement that the maximal length of bad sequences is computable.

A sequence of words $x_1, \ldots, x_l$ is $k$-controlled ($k \in \mathbb{N}$) if $|x_i| \leq ik$ for all $i = 1, \ldots, l$.

Length Function Theorem (see App. A). There exists a computable function $H : \mathbb{N}^3 \to \mathbb{N}$
such that any $n$-bad $k$-controlled sequence of words in $\Gamma^*$ has length at most $H(n,k,|\Gamma|)$.
Furthermore, $H$ is monotonic in all three arguments.

Thus, a sequence with more than $H(n,k,|\Gamma|)$ words is $n$-good or is not $k$-controlled. We refer
to [12] for the complexity of $H$. Here it is enough to know that $H$ is computable.

3 Deciding $\mathrm{PEP}^{\mathrm{partial}}_{\mathrm{dir}}$, or PEP with partial directness

We introduce $\mathrm{PEP}^{\mathrm{partial}}_{\mathrm{dir}}$, a problem generalizing both PEP and
$\mathrm{PEP}_{\mathrm{dir}}$, and show its decidability. This is proved by showing that if a
$\mathrm{PEP}^{\mathrm{partial}}_{\mathrm{dir}}$ instance has a solution, then it has a solution whose
length is bounded by a computable function of the input. This proof is simpler and more direct than the
proof (for PEP only) based on blockers [2].

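DEFINITION 2. $\mathrm{PEP}^{\mathrm{partial}}_{\mathrm{dir}}$ is the problem of deciding, given morphisms
$u,v : \Sigma^* \to \Gamma^*$ and regular languages $R,R' \in \mathrm{Reg}(\Sigma)$, whether there is
$\sigma \in R$ such that $u(\sigma) \sqsubseteq v(\sigma)$ and $u(\tau) \sqsubseteq v(\tau)$ for all
prefixes $\tau$ of $\sigma$ belonging to $R'$ (in which case $\sigma$ is called a solution).

$\mathrm{PEP}^{\mathrm{partial}}_{\mathrm{codir}}$ is the variant problem of deciding whether there is
$\sigma \in R$ such that $u(\sigma) \sqsubseteq v(\sigma)$ and $u(\tau) \sqsubseteq v(\tau)$ for all
suffixes $\tau$ of $\sigma$ that belong to $R'$.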
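Checking whether a given candidate $\sigma$ is a solution is straightforward, as the following Python sketch illustrates. Representing $R$ and $R'$ by Python regular expressions, the function name and the `codirect` flag are assumptions of this sketch rather than notation from the paper.

```python
import re


def is_subword(s, t):
    """True if s is a scattered subword of t."""
    it = iter(t)
    return all(ch in it for ch in s)


def embeds(u, v, word):
    """True if u(word) is a subword of v(word), for morphisms given as dicts."""
    return is_subword("".join(u[a] for a in word), "".join(v[a] for a in word))


def is_partial_solution(sigma, u, v, R, R_prime, codirect=False):
    """Check the partial directness (or codirectness) conditions for sigma."""
    if not re.fullmatch(R, sigma) or not embeds(u, v, sigma):
        return False
    for i in range(len(sigma) + 1):
        part = sigma[i:] if codirect else sigma[:i]   # suffix or prefix of sigma
        if re.fullmatch(R_prime, part) and not embeds(u, v, part):
            return False
    return True


# Hypothetical toy instance over Sigma = {a, b}.
u = {"a": "x", "b": ""}
v = {"a": "", "b": "xx"}
EMPTY = r"[^\s\S]"   # a regular expression matching no word, standing for R' = empty set
print(is_partial_solution("ab", u, v, "[ab]*", EMPTY))     # plain PEP
print(is_partial_solution("ab", u, v, "[ab]*", "[ab]*"))   # fully direct
```

With an empty $R'$ the check is plain PEP, and with $R' = \Sigma^*$ it is the fully direct (or codirect) variant.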

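Both PEP and $\mathrm{PEP}_{\mathrm{dir}}$ are special cases of $\mathrm{PEP}^{\mathrm{partial}}_{\mathrm{dir}}$,
obtained by taking $R' = \emptyset$ and $R' = \Sigma^*$ respectively. Obviously
$\mathrm{PEP}^{\mathrm{partial}}_{\mathrm{dir}}$ and $\mathrm{PEP}^{\mathrm{partial}}_{\mathrm{codir}}$ are
two equivalent presentations, modulo mirroring, of a same problem. Given a
$\mathrm{PEP}^{\mathrm{partial}}_{\mathrm{dir}}$ (or $\mathrm{PEP}^{\mathrm{partial}}_{\mathrm{codir}}$)
instance, we let $K_u \stackrel{\text{def}}{=} \max_{a \in \Sigma} |u(a)|$ denote the expansion factor of
$u$ and say that $\sigma \in \Sigma^*$ is long if $|\sigma| > 2H(n_R n_{R'} + 1, K_u, |\Gamma|)$, otherwise
it is short (recall that $H(n,k,|\Gamma|)$ was defined with the Length Function Theorem). In this section
we prove: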




THEOREM 3. A $\mathrm{PEP}^{\mathrm{partial}}_{\mathrm{dir}}$ or $\mathrm{PEP}^{\mathrm{partial}}_{\mathrm{codir}}$
instance has a solution if, and only if, it has a short solution. This entails that
$\mathrm{PEP}^{\mathrm{partial}}_{\mathrm{dir}}$ and $\mathrm{PEP}^{\mathrm{partial}}_{\mathrm{codir}}$ are decidable.

Decidability is an obvious consequence since the maximal length for short solutions is computable,
and since it is easy to check whether a candidate $\sigma$ is a solution. Furthermore, one derives
an upper bound on the complexity of $\mathrm{PEP}^{\mathrm{partial}}_{\mathrm{dir}}$ since the Length
Function $H$ is bounded in $\mathcal{F}_{\omega^\omega}$ [12].

For the proof of Theorem 3 we find it easier to reason on the codirect version. Pick an arbitrary
$\mathrm{PEP}^{\mathrm{partial}}_{\mathrm{codir}}$ instance $(\Sigma,\Gamma,u,v,R,R')$ and a solution
$\sigma$. Write $N = |\sigma|$ for its length, $\sigma[0,i)$ and $\sigma[i,N)$ for, respectively, its
prefix of length $i$ and its suffix of length $N-i$. Two indices $i,j \in [0,N]$ are congruent if
$\sigma[i,N) \sim_R \sigma[j,N)$ and $\sigma[i,N) \sim_{R'} \sigma[j,N)$. When $\sigma$ is fixed, as in the
rest of this section, we use shorthand notations like $u_{0,i}$ and $v_{i,j}$ to denote the images, here
$u(\sigma[0,i))$ and $v(\sigma[i,j))$, of factors of $\sigma$.

We prove two "cutting lemmas" giving sufficient conditions for "cutting" a solution $\sigma = \sigma[0,N)$
along certain indices $a < b$, yielding a shorter solution $\sigma' = \sigma[0,a)\sigma[b,N)$. Here the
following notation is useful. We associate, with every suffix $\tau$ of $\sigma'$, a corresponding suffix,
denoted $S(\tau)$, of $\sigma$: if $\tau$ is a suffix of $\sigma[b,N)$, then
$S(\tau) \stackrel{\text{def}}{=} \tau$, otherwise, $\tau = \sigma[i,a)\sigma[b,N)$ for some $i < a$ and we
let $S(\tau) \stackrel{\text{def}}{=} \sigma[i,N)$. In particular $S(\sigma') = \sigma$.

An index $i \in [0,N]$ is said to be blue if $u_{i,N} \sqsubseteq v_{i,N}$, it is red otherwise. In
particular, $N$ is blue trivially, $0$ is blue since $\sigma$ is a solution, and $i$ is blue whenever
$\sigma[i,N) \in R'$. If $i$ is a blue index, let $l_i \in \Gamma^*$ be the longest suffix of $u_{0,i}$
such that $l_i u_{i,N} \sqsubseteq v_{i,N}$ and call it the left margin at $i$.
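The colouring of indices and the left margins at blue indices can be computed directly from these definitions. The Python sketch below is an illustration only: the function name and the output format are ours, and the greedy extension of the margin relies on the fact that any suffix of a word $l$ with $l\,u_{i,N} \sqsubseteq v_{i,N}$ satisfies the same property.

```python
def is_subword(s, t):
    """True if s is a scattered subword of t."""
    it = iter(t)
    return all(ch in it for ch in s)


def image(h, word):
    return "".join(h[a] for a in word)


def colours_and_left_margins(sigma, u, v):
    """For each index i in [0, N], return ('blue', l_i) or ('red', None)."""
    result = []
    for i in range(len(sigma) + 1):
        u_suffix, v_suffix = image(u, sigma[i:]), image(v, sigma[i:])
        if not is_subword(u_suffix, v_suffix):
            result.append(("red", None))
            continue
        u_prefix = image(u, sigma[:i])
        margin = ""
        # Grow the candidate suffix of u_{0,i} one letter at a time while it still fits.
        for k in range(1, len(u_prefix) + 1):
            candidate = u_prefix[len(u_prefix) - k:]
            if not is_subword(candidate + u_suffix, v_suffix):
                break
            margin = candidate
        result.append(("blue", margin))
    return result


# Hypothetical toy instance.
u = {"a": "x", "b": ""}
v = {"a": "", "b": "xx"}
print(colours_and_left_margins("ab", u, v))
```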

LEMMA 4. (Cutting lemma for blue indices) Let $a < b$ be two congruent and blue indices. If
$l_a \sqsubseteq l_b$, then $\sigma' = \sigma[0,a)\sigma[b,N)$ is a solution (shorter than $\sigma$).

PROOF. Clearly $\sigma' \in R$ since $\sigma \in R$ and $a$ and $b$ are congruent. Also, for all suffixes
$\tau$ of $\sigma'$, $S(\tau) \in R'$ iff $\tau \in R'$.

We claim that, for any suffix $\tau$ of $\sigma'$, if $u(S(\tau)) \sqsubseteq v(S(\tau))$ then
$u(\tau) \sqsubseteq v(\tau)$. This is obvious when $\tau = S(\tau)$, so we assume $\tau \neq S(\tau)$,
i.e., $\tau = \sigma[i,a)\sigma[b,N)$ and $S(\tau) = \sigma[i,N)$ for some $i < a$.
Assume $u(S(\tau)) \sqsubseteq v(S(\tau))$, i.e., $u_{i,N} \sqsubseteq v_{i,N}$. Now at least one of
$u_{i,a}$ and $l_a$ is a suffix of the other, which gives two cases. If $u_{i,a}$ is a suffix of $l_a$, then

$u(\tau) = u_{i,a} u_{b,N} \sqsubseteq l_a u_{b,N} \sqsubseteq l_b u_{b,N} \sqsubseteq v_{b,N} \sqsubseteq v(\tau)$.

Otherwise, $u_{i,a} = x l_a$ for some $x$ (see Fig. 1). Then $u_{i,N} \sqsubseteq v_{i,N}$ rewrites as
$u_{i,a} u_{a,N} = x l_a u_{a,N} \sqsubseteq v_{i,a} v_{a,N}$.

[Figure 1: Schematics for Lemma 4, with $l_a \sqsubseteq l_b$]

Now, and since $l_a$ is the longest suffix for which $l_a u_{a,N} \sqsubseteq v_{a,N}$, Lemma 1.2 entails
$x \sqsubseteq v_{i,a}$. Combining with $l_a \sqsubseteq l_b$ (assumption of the Lemma) gives:

$u(\tau) = u_{i,a} u_{b,N} = x l_a u_{b,N} \sqsubseteq v_{i,a} l_b u_{b,N} \sqsubseteq v_{i,a} v_{b,N} = v(\tau)$.

This shows that $\sigma'$ is a solution (which completes the proof) since we can infer
$u(\tau) \sqsubseteq v(\tau)$ for any suffix $\tau \in R'$ (or for $\tau = \sigma'$) from the corresponding
$u(S(\tau)) \sqsubseteq v(S(\tau))$.

If $i$ is a red index, let $r_i \in \Gamma^*$ be the shortest prefix of $u_{i,N}$ such that
$r_i^{-1} u_{i,N} \sqsubseteq v_{i,N}$ (equivalently $u_{i,N} \sqsubseteq r_i v_{i,N}$) and call it the
right margin at $i$.

LEMMA 5. (Cutting lemma for red indices) Let $a < b$ be two congruent and red indices. If
$r_b \sqsubseteq r_a$, then $\sigma' = \sigma[0,a)\sigma[b,N)$ is a solution (shorter than $\sigma$).

PROOF. Write $u_{b,N}$ under the form $r_b x$ so that $x \sqsubseteq v_{b,N}$. We proceed as for Lemma 4
and show that $u(S(\tau)) \sqsubseteq v(S(\tau))$ implies $u(\tau) \sqsubseteq v(\tau)$ for all suffixes
$\tau$ of $\sigma'$. Assume $u(S(\tau)) \sqsubseteq v(S(\tau))$ for some $\tau$. The only interesting case
is when $\tau \neq S(\tau)$ and $\tau = \sigma[i,a)\sigma[b,N)$ for some $i < a$ (see Fig. 2).

[Figure 2: Schematics for Lemma 5, with $r_b \sqsubseteq r_a$]

From $u_{i,N} = u_{i,a} u_{a,N} \sqsubseteq v_{i,a} v_{a,N} = v_{i,N}$, i.e.,
$u(S(\tau)) \sqsubseteq v(S(\tau))$, and $u_{a,N} \not\sqsubseteq v_{a,N}$ (since $a$ is a red index), the
definition of $r_a$ entails $u_{i,a} r_a \sqsubseteq v_{i,a}$ (Lemma 1.3). Then

$u(\tau) = u_{i,a} u_{b,N} = u_{i,a} r_b x \sqsubseteq u_{i,a} r_a v_{b,N} \sqsubseteq v_{i,a} v_{b,N} = v(\tau)$.



We now conclude the proof of Theorem 3. Let $g_1 < g_2 < \cdots < g_{N_1}$ be the blue indices in $\sigma$,
let $b_1 < b_2 < \cdots < b_{N_2}$ be the red indices, and look at the corresponding sequences
$(l_{g_i})_{i=1,\ldots,N_1}$ of left margins and $(r_{b_i})_{i=1,\ldots,N_2}$ of right margins.

LEMMA 6. (See App. B.2) $|l_{g_i}| \leq (i-1) \times K_u$ for all $i = 1,\ldots,N_1$, and
$|r_{b_i}| \leq (N_2 - i + 1) \times K_u$ for all $i = 1,\ldots,N_2$. In other words, the sequence of left
margins and the reversed sequence of right margins are $K_u$-controlled.

Now, let $N_c \stackrel{\text{def}}{=} n_R n_{R'} + 1$ and $L \stackrel{\text{def}}{=} H(N_c, K_u, |\Gamma|)$
and assume $N > 2L$. Since $N_1 + N_2 = N + 1$, either $\sigma$ has at least $L + 1$ blue indices and, by
definition of $L$ and $H$, there exist $N_c$ blue indices $a_1 < a_2 < \cdots < a_{N_c}$ with
$l_{a_1} \sqsubseteq l_{a_2} \sqsubseteq \cdots \sqsubseteq l_{a_{N_c}}$, or $\sigma$ has at least $L + 1$
red indices and there exist $N_c$ red indices $a'_1 < a'_2 < \cdots < a'_{N_c}$ with
$r_{a'_{N_c}} \sqsubseteq \cdots \sqsubseteq r_{a'_2} \sqsubseteq r_{a'_1}$ (since it is the reversed
sequence of right margins that is controlled). Out of $N_c = 1 + n_R n_{R'}$ indices, two must be
congruent, fulfilling the assumptions of Lemma 4 or Lemma 5. Therefore $\sigma$ is not the shortest
solution, proving Theorem 3.
4 Counting the number of solutions

We consider two counting questions: $\exists^\infty\mathrm{PEP}^{\mathrm{partial}}_{\mathrm{dir}}$ is the
question whether a $\mathrm{PEP}^{\mathrm{partial}}_{\mathrm{dir}}$ instance has infinitely many solutions
(a decision problem), while $\#\mathrm{PEP}^{\mathrm{partial}}_{\mathrm{dir}}$ is the problem of computing
the number of solutions of the instance (a number in $\mathbb{N} \cup \{\infty\}$). For technical
convenience, we often deal with the (equivalent) codirected versions,
$\exists^\infty\mathrm{PEP}^{\mathrm{partial}}_{\mathrm{codir}}$ and $\#\mathrm{PEP}^{\mathrm{partial}}_{\mathrm{codir}}$.

For an instance $(\Sigma,\Gamma,u,v,R,R')$, we let $K_v \stackrel{\text{def}}{=} \max_{a \in \Sigma} |v(a)|$ and define

$L \stackrel{\text{def}}{=} H(n_R n_{R'} + 1, K_v, |\Gamma|)$, $\quad L' \stackrel{\text{def}}{=} H\bigl([n_R(2L+1)]^{n_R(2L+1)} n_{R'} + 1, K_u, |\Gamma|\bigr)$.

We say that a solution $\sigma \in \Sigma^*$ is long if $|\sigma| > 2L$ and very long if $|\sigma| > 2L'$
(note that "long" is slightly different from "not short" from Section 3). In this section we prove:

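THEOREM 7. For a $\mathrm{PEP}^{\mathrm{partial}}_{\mathrm{dir}}$ or $\mathrm{PEP}^{\mathrm{partial}}_{\mathrm{codir}}$
instance, the following are equivalent:

(a) It has infinitely many solutions.

(b) It has a long solution.

(c) It has a solution that is long but not very long.

From this, it will be easy to count the number of solutions:

COROLLARY 8. $\exists^\infty\mathrm{PEP}^{\mathrm{partial}}_{\mathrm{dir}}$ and
$\exists^\infty\mathrm{PEP}^{\mathrm{partial}}_{\mathrm{codir}}$ are decidable,
$\#\mathrm{PEP}^{\mathrm{partial}}_{\mathrm{dir}}$ and $\#\mathrm{PEP}^{\mathrm{partial}}_{\mathrm{codir}}$ are computable.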

PROOF. Decidability for the decision problems is clear since $L$ and $L'$ are computable.

For actually counting the solutions, we check whether the number of solutions is finite or not
using the decision problems. If infinite, we are done. If finite, we first compute an upper bound on
the length of the longest solution. For this we build $\mathrm{PEP}^{\mathrm{partial}}_{\mathrm{dir}}$
(resp. $\mathrm{PEP}^{\mathrm{partial}}_{\mathrm{codir}}$) instances where $R$ is replaced by
$R \setminus \Sigma^{\leq M}$ (which is regular when $R$ is) for increasing values of $M \in \mathbb{N}$.
When eventually $M$ is large enough, the instance is negative and this can be detected (by Theorem 3).
Once we know that there are no solutions longer than $M$, counting solutions is done by finite enumeration.

We now prove Theorem 7. First observe that if the instance has a long solution, it has a solution
with $R$ replaced by $R \cap \Sigma^{>2L}$. This language has a DFA with $n_R(2L+1)$ states, thus the
associated congruence has index at most $(n_R(2L+1))^{n_R(2L+1)}$. From Theorem 3, the instance has a
solution which is long but not very long. Hence (b) and (c) are equivalent.

It remains to show (b) implies (a) since obviously (a) implies (b). For this we fix an arbitrary
$\mathrm{PEP}^{\mathrm{partial}}_{\mathrm{codir}}$ instance $(\Sigma,\Gamma,u,v,R,R')$ and consider a
solution $\sigma$, of length $N$. We develop two so-called "iteration lemmas" that are similar to the
cutting lemmas from Section 3, with the difference that they expand $\sigma$ instead of reducing it.

As before, an index $i \in [0,N]$ is said to be blue if $u_{i,N} \sqsubseteq v_{i,N}$, and red otherwise.
With blue and red indices we associate words analogous to the $l_i$'s and $r_i$'s from Section 3, however
now they are factors of $v(\sigma)$, not $u(\sigma)$ (hence the different definition for $L$). The terms
"left margin" and "right margin" will be reused here for these factors.

We start with blue indices. For a blue index $i \in [0,N]$, let $s_i$ be the longest prefix of $v_{i,N}$
such that $u_{i,N} \sqsubseteq s_i^{-1} v_{i,N}$ (equivalently $s_i u_{i,N} \sqsubseteq v_{i,N}$) and call
it the right margin at $i$.

LEMMA 9. Suppose $a < b$ are two blue indices with $s_b \sqsubseteq s_a$. Then for all $k \geq 1$,
$s_a (u_{a,b})^k \sqsubseteq (v_{a,b})^k s_b$.

PROOF. $s_a u_{a,N} \sqsubseteq v_{a,N}$ expands as $(s_a u_{a,b}) u_{b,N} \sqsubseteq v_{a,b} v_{b,N}$.
Since $u_{b,N} \sqsubseteq v_{b,N}$, Lemma 1.4 yields $s_a u_{a,b} \sqsubseteq v_{a,b} s_b$. One concludes
with Lemma 1.6, using $s_b \sqsubseteq s_a$.

LEMMA 10. (Iteration lemma for blue indices, see App. C.1) Let $a < b$ be two congruent and
blue indices. If $s_b \sqsubseteq s_a$, then for every $k \geq 1$,
$\sigma' = \sigma[0,a).\sigma[a,b)^k.\sigma[b,N)$ is a solution.

Now to red indices. For a red index $i \in [0,N]$, let $t_i$ be the shortest suffix of $v_{0,i}$ such that
$u_{i,N} \sqsubseteq t_i v_{i,N}$. This is called the left margin at $i$. Thus, for a blue $j$ such that
$j < i$, $u_{j,N} \sqsubseteq v_{j,N}$ implies $u_{j,i} t_i \sqsubseteq v_{j,i}$ by Lemma 1.5.

LEMMA 11. (Iteration lemma for red indices, see App. C.2) Let $a < b$ be two congruent and red
indices. If $t_a \sqsubseteq t_b$, then for every $k \geq 1$,
$\sigma' = \sigma[0,a).\sigma[a,b)^k.\sigma[b,N)$ is a solution.
We now conclude the proof of Theorem 7. We first prove that the
$\mathrm{PEP}^{\mathrm{partial}}_{\mathrm{codir}}$ instance has infinitely many solutions iff it has a long
solution. Obviously, only the right-to-left implication has to be proven.

Suppose there are $N_1$ blue indices in $\sigma$, say $g_1 < g_2 < \cdots < g_{N_1}$, and $N_2$ red
indices, say $b_1 < b_2 < \cdots < b_{N_2}$.

LEMMA 12. (See App. C.3) $|s_{g_i}| \leq (N_1 - i + 1) \times K_v$ for all $i = 1,\ldots,N_1$, and
$|t_{b_i}| \leq (i-1) \times K_v$ for all $i = 1,\ldots,N_2$. That is, the reversed sequence of right
margins and the sequence of left margins are $K_v$-controlled.

Assume that $\sigma$ is a long solution of length $N \geq 2L + 1$. At least $L + 1$ indices among $[0,N]$
are blue, or at least $L + 1$ are red. We apply one of the two above claims, and from either
$s_{g_{N_1}},\ldots,s_{g_1}$ (if $N_1 \geq L + 1$) or $t_{b_1},\ldots,t_{b_{N_2}}$ (if $N_2 \geq L + 1$)
we get an increasing subsequence of length $n_R n_{R'} + 1$. Among these there must be two congruent
indices. Then we get infinitely many solutions by Lemma 10 or Lemma 11.



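5 Universal variants of $\mathrm{PEP}^{\mathrm{partial}}_{\mathrm{dir}}$

We consider universal variants of $\mathrm{PEP}^{\mathrm{partial}}_{\mathrm{dir}}$ (or rather
$\mathrm{PEP}^{\mathrm{partial}}_{\mathrm{codir}}$ for the sake of uniformity). Formally, given instances
$(\Sigma,\Gamma,u,v,R,R')$ as usual, $\forall\mathrm{PEP}^{\mathrm{partial}}_{\mathrm{codir}}$ is the
question whether every $\sigma \in R$ is a solution, i.e., satisfies both $u(\sigma) \sqsubseteq v(\sigma)$
and $u(\tau) \sqsubseteq v(\tau)$ for all suffixes $\tau$ that belong to $R'$. Similarly,
$\forall^\infty\mathrm{PEP}^{\mathrm{partial}}_{\mathrm{codir}}$ is the question whether "almost all", i.e.,
all but finitely many, $\sigma$ in $R$ are solutions, and $\#\neg\mathrm{PEP}^{\mathrm{partial}}_{\mathrm{codir}}$
is the associated counting problem that asks how many $\sigma \in R$ are not solutions.

The special cases $\forall\mathrm{PEP}$ and $\forall^\infty\mathrm{PEP}$ (where $R' = \emptyset$) have been
shown decidable in [7] where it appears that, at least for Post Embedding, universal questions are simpler
than existential ones. We now observe that $\forall\mathrm{PEP}^{\mathrm{partial}}_{\mathrm{codir}}$ and
$\forall^\infty\mathrm{PEP}^{\mathrm{partial}}_{\mathrm{codir}}$ are easy to solve too: partial codirectness
constraints can be eliminated since universal quantifications commute with conjunctions (and since the
codirectness constraint is universal itself).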

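LEMMA 13. $\forall\mathrm{PEP}^{\mathrm{partial}}_{\mathrm{codir}}$ and
$\forall^\infty\mathrm{PEP}^{\mathrm{partial}}_{\mathrm{codir}}$ many-one reduce to $\forall^\infty\mathrm{PEP}$.

COROLLARY 14. $\forall\mathrm{PEP}^{\mathrm{partial}}_{\mathrm{codir}}$ and
$\forall^\infty\mathrm{PEP}^{\mathrm{partial}}_{\mathrm{codir}}$ are decidable,
$\#\neg\mathrm{PEP}^{\mathrm{partial}}_{\mathrm{codir}}$ is computable.

We now prove Lemma 13. First, $\forall\mathrm{PEP}^{\mathrm{partial}}_{\mathrm{codir}}$ easily reduces to
$\forall^\infty\mathrm{PEP}^{\mathrm{partial}}_{\mathrm{codir}}$: add an extra letter $z$ to $\Sigma$ with
$u(z) = v(z) = \varepsilon$ and replace $R$ and $R'$ with $R.z^*$ and $R'.z^*$. Hence the second half of
the lemma entails its first half by transitivity of reductions.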

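For reducing $\forall^\infty\mathrm{PEP}^{\mathrm{partial}}_{\mathrm{codir}}$, it is easier to start with
the negation of our question:

$\exists^\infty \sigma \in R : \bigl(u(\sigma) \not\sqsubseteq v(\sigma)$ or $\sigma$ has a suffix $\tau$ in $R'$ with $u(\tau) \not\sqsubseteq v(\tau)\bigr)$. $\quad(*)$

Call $\sigma \in R$ a type 1 witness if $u(\sigma) \not\sqsubseteq v(\sigma)$, and a type 2 witness if it
has a suffix $\tau \in R'$ with $u(\tau) \not\sqsubseteq v(\tau)$. Statement $(*)$ holds if, and only if,
there are infinitely many type 1 witnesses or infinitely many type 2 witnesses. The existence of infinitely
many type 1 witnesses (call that "case 1") is the negation of a $\forall^\infty\mathrm{PEP}$ question. Now
suppose that there are infinitely many type 2 witnesses, say $\sigma_1, \sigma_2, \ldots$ For each $i$, pick
a suffix $\tau_i$ of $\sigma_i$ such that $\tau_i \in R'$ and $u(\tau_i) \not\sqsubseteq v(\tau_i)$. The set
$\{\tau_i \mid i = 1,2,\ldots\}$ of these suffixes can be finite or infinite. If it is infinite ("case 2a"),
then

$u(\tau) \not\sqsubseteq v(\tau)$ for infinitely many $\tau \in (\overleftarrow{R} \cap R')$, $\quad(**)$

where $\overleftarrow{R}$ is short for $\overleftarrow{R}^{\geq 0}$ and, for $k \in \mathbb{N}$,
$\overleftarrow{R}^{\geq k} \stackrel{\text{def}}{=} \{y \mid \exists x : (|x| \geq k \text{ and } xy \in R)\}$
is the set of the suffixes of words from $R$ one obtains by removing at least $k$ letters. Observe that,
conversely, $(**)$ implies the existence of infinitely many type 2 witnesses (for a proof, pick
$\tau_1 \in \overleftarrow{R} \cap R'$ satisfying the above, choose $\sigma_1 \in R$ of which $\tau_1$ is a
suffix. Then choose $\tau_2$ such that $|\tau_2| > |\sigma_1|$, and proceed similarly).

On the other hand, if $\{\tau_i \mid i = 1,2,\ldots\}$ is finite ("case 2b"), then there is a
$\tau \in R'$ such that $u(\tau) \not\sqsubseteq v(\tau)$ and $\sigma'\tau \in R$ for infinitely many
$\sigma'$. By a standard pumping argument, the second point is equivalent to the existence of some such
$\sigma'$ with also $|\sigma'| > k_R$, where $k_R$ is the size of an NFA for $R$ (taking $k_R = n_R$ also
works). Write now $\hat{R}$ for $\overleftarrow{R}^{> k_R}$: if $\{\tau_i \mid i = 1,2,\ldots\}$ is finite,
then $u(\tau) \not\sqsubseteq v(\tau)$ for some $\tau$ in $(R' \cap \hat{R})$, and conversely this implies
the existence of infinitely many type 2 witnesses.

To summarize, and since $\overleftarrow{R}$ and $\hat{R}$ are regular and effectively computable from $R$,
we have just reduced $\forall^\infty\mathrm{PEP}^{\mathrm{partial}}_{\mathrm{codir}}$ to the following
conjunction

$$\underbrace{\forall^\infty \sigma \in R : u(\sigma) \sqsubseteq v(\sigma)}_{\text{not case 1}} \;\wedge\; \underbrace{\forall^\infty \tau \in (\overleftarrow{R} \cap R') : u(\tau) \sqsubseteq v(\tau)}_{\text{not case 2a}} \;\wedge\; \underbrace{\forall \tau \in (\hat{R} \cap R') : u(\tau) \sqsubseteq v(\tau)}_{\text{not case 2b}}.$$

This is now reduced to a single $\forall^\infty\mathrm{PEP}$ instance by rewriting the $\forall\mathrm{PEP}$
into a $\forall^\infty\mathrm{PEP}$ (as said in the beginning of this proof) and relying on distributivity:

$$\bigwedge_{i=1}^{n} \bigl[\forall^\infty x \in X_i : \ldots \text{some property} \ldots\bigr] \;\equiv\; \forall^\infty x \in \bigcup_{i=1}^{n} X_i : \ldots \text{same} \ldots.$$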

6 Undecidability for $\mathrm{PEP}_{\mathrm{co\&dir}}$ and other extensions

The decidability of $\mathrm{PEP}^{\mathrm{partial}}_{\mathrm{dir}}$ is a non-trivial generalization of
previous results for PEP. It is a natural question whether one can further generalize the idea of partial
directness and maintain decidability. In this section we describe two attempts that lead to undecidability,
even though they remain inside the regular PEP framework.¹



¹ PEP is undecidable if we allow constraint sets $R$ outside $\mathrm{Reg}(\Sigma)$ [2]. Other extensions,
like $\exists x \in R_1 : \forall y \in R_2 : u(xy) \sqsubseteq v(xy)$, for $R_1,R_2 \in \mathrm{Reg}(\Sigma)$,
have been shown undecidable [6].

Allowing non-regular $R'$. One direction for extending $\mathrm{PEP}^{\mathrm{partial}}_{\mathrm{dir}}$ is
to allow more expressive $R'$ sets for partial (co)directness. Let
$\mathrm{PEP}^{\mathrm{partial[DCFL]}}_{\mathrm{codir}}$ and $\mathrm{PEP}^{\mathrm{partial[Pres]}}_{\mathrm{codir}}$
be like $\mathrm{PEP}^{\mathrm{partial}}_{\mathrm{codir}}$ except that $R'$ can be any deterministic
context-free $R' \in \mathrm{DCFL}(\Sigma)$ (respectively, any Presburger-definable
$R' \in \mathrm{Pres}(\Sigma)$, i.e., a language whose Parikh image is a Presburger, or semilinear, subset
of $\mathbb{N}^{|\Sigma|}$). Note that $R \in \mathrm{Reg}(\Sigma)$ is still required.

THEOREM 15. (Undecidability) $\mathrm{PEP}^{\mathrm{partial[DCFL]}}_{\mathrm{codir}}$ and
$\mathrm{PEP}^{\mathrm{partial[Pres]}}_{\mathrm{codir}}$ are $\Sigma^0_1$-complete.

Since both problems clearly are in $\Sigma^0_1$, one only has to prove hardness by reduction, e.g., from
PCP, Post's Correspondence Problem. Let $(\Sigma,\Gamma,u,v)$ be a PCP instance (where the question is
whether there exists $x \in \Sigma^+$ such that $u(x) = v(x)$). Extend $\Sigma$ and $\Gamma$ with new
symbols: $\Sigma' \stackrel{\text{def}}{=} \Sigma \cup \{1,2\}$ and
$\Gamma' \stackrel{\text{def}}{=} \Gamma \cup \{\#\}$. Now define $u',v' : \Sigma'^* \to \Gamma'^*$ by
extending $u,v$ on the new symbols with $u'(1) = v'(2) = \varepsilon$ and $u'(2) = v'(1) = \#$. Define now
$R = 12\Sigma^+$ and $R' = \{\tau 2 \tau' \mid \tau,\tau' \in \Sigma^* \text{ and } |u(\tau\tau')| \neq |v(\tau\tau')|\}$.
Note that $R'$ is deterministic context-free and Presburger-definable.

LEMMA 16. (See App. D) The PCP instance $(\Sigma,\Gamma,u,v)$ has a solution if and only if the
$\mathrm{PEP}^{\mathrm{partial[Pres]}}_{\mathrm{codir}}$ and $\mathrm{PEP}^{\mathrm{partial[DCFL]}}_{\mathrm{codir}}$
instance $(\Sigma',\Gamma',u',v',R,R')$ has a solution.
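The construction can be replayed on small examples. The Python sketch below builds $u'$ and $v'$ from a PCP instance and checks whether a word $12x$ is a solution of the resulting partially codirect instance. It is only an illustration under our own conventions: morphisms are dictionaries over single-character letters, the letters `1`, `2` and `#` are assumed not to occur in the PCP instance, and all function names are ours.

```python
def is_subword(s, t):
    """True if s is a scattered subword of t."""
    it = iter(t)
    return all(ch in it for ch in s)


def image(h, word):
    return "".join(h[a] for a in word)


def extend(u, v):
    """Extend u, v with u'(1) = v'(2) = empty word and u'(2) = v'(1) = '#'."""
    u2, v2 = dict(u), dict(v)
    u2["1"], v2["1"] = "", "#"
    u2["2"], v2["2"] = "#", ""
    return u2, v2


def in_R_prime(word, u, v):
    """word = t 2 t' with t, t' over Sigma and |u(t t')| != |v(t t')|."""
    if word.count("2") != 1 or any(c != "2" and c not in u for c in word):
        return False
    rest = word.replace("2", "")
    return len(image(u, rest)) != len(image(v, rest))


def is_solution(word, u, v):
    """Check membership in R = 12 Sigma^+ and the partial codirectness conditions."""
    u2, v2 = extend(u, v)
    if len(word) < 3 or word[:2] != "12" or any(c not in u for c in word[2:]):
        return False
    if not is_subword(image(u2, word), image(v2, word)):
        return False
    return all(is_subword(image(u2, word[i:]), image(v2, word[i:]))
               for i in range(len(word) + 1) if in_R_prime(word[i:], u, v))


# Hypothetical PCP instance with solution "ab": u(ab) = v(ab) = "xyx".
u = {"a": "xy", "b": "x"}
v = {"a": "x", "b": "yx"}
print(is_solution("12ab", u, v), is_solution("12a", u, v))
```

On this example, `12ab` passes because $u(ab) = v(ab)$, while a word $12x$ with $u(x) \sqsubseteq v(x)$ but $|u(x)| \neq |v(x)|$ is rejected through its suffix $2x \in R'$.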



Combining directness and codirectness. Another direction is to allow combining directness and
codirectness constraints. Formally, $\mathrm{PEP}_{\mathrm{co\&dir}}$ is the problem of deciding, given
$\Sigma$, $\Gamma$, $u$, $v$, and $R \in \mathrm{Reg}(\Sigma)$ as usual, whether there exists
$\sigma \in R$ such that $u(\tau) \sqsubseteq v(\tau)$ and $u(\tau') \sqsubseteq v(\tau')$ for all
decompositions $\sigma = \tau.\tau'$. In other words, $\sigma$ is both a direct and a codirect solution.

Note that $\mathrm{PEP}_{\mathrm{co\&dir}}$ has no $R'$ parameter (or, equivalently, has $R' = \Sigma^*$)
and requires directness and codirectness at all positions. However, this restricted combination is already
undecidable:

THEOREM 17. (Undecidability) $\mathrm{PEP}_{\mathrm{co\&dir}}$ is $\Sigma^0_1$-complete.

Membership in $\Sigma^0_1$ is clear and we prove hardness by reducing from the Reachability Problem for
length-preserving semi-Thue systems. The undecidability is linked to relying on different embeddings
of $u(\sigma)$ in $v(\sigma)$ for the directness and codirectness. In contrast, for
$\mathrm{PEP}^{\mathrm{partial}}_{\mathrm{dir}}$ we need to consider only the leftmost embedding of
$u(\sigma)$ in $v(\sigma)$.

A semi-Thue system $S = (\Upsilon, \Delta)$ has a finite set $\Delta \subseteq \Upsilon^* \times \Upsilon^*$
of string rewrite rules over some alphabet $\Upsilon$, written
$\Delta = \{l_1 \to r_1, \ldots, l_k \to r_k\}$. The one-step rewrite relation
$\to_\Delta \subseteq \Upsilon^* \times \Upsilon^*$ is defined as usual with
$x \to_\Delta y \stackrel{\text{def}}{\Leftrightarrow} x = zlz'$ and $y = zrz'$ for some rule $l \to r$ in
$\Delta$ and strings $z,z'$ in $\Upsilon^*$. We write $x \to^m_\Delta y$ and $x \to^*_\Delta y$ when $x$
can be rewritten into $y$ by a sequence of $m$ (respectively, any number, possibly zero) rewrite steps.
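The rewrite relation is easily made executable. In the Python sketch below, rules are given as pairs of strings, and the function names are ours; it is meant only as an illustration of the definitions, using rules that reappear in the running example of this section.

```python
def one_step(x, rules):
    """All words y with x -> y in one rewrite step."""
    successors = set()
    for l, r in rules:
        start = x.find(l)
        while start != -1:
            # Replace the occurrence of l found at position 'start' by r.
            successors.add(x[:start] + r + x[start + len(l):])
            start = x.find(l, start + 1)
    return successors


def reachable_in(x, rules, m):
    """All words reachable from x in exactly m rewrite steps."""
    current = {x}
    for _ in range(m):
        current = {y for w in current for y in one_step(w, rules)}
    return current


rules = [("ab", "bc"), ("cc", "aa")]   # a length-preserving system
print(one_step("abc", rules))          # {'bcc'}
print(reachable_in("abc", rules, 2))   # {'baa'}
```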

The Reachability Problem for semi-Thue systems is "Given $S = (\Upsilon,\Delta)$ and two regular
languages $P_1,P_2 \in \mathrm{Reg}(\Upsilon)$, is there $x \in P_1$ and $y \in P_2$ s.t.
$x \to^*_\Delta y$?". It is well-known (or easy to see by encoding Turing machines in semi-Thue systems)
that this problem is undecidable (in fact, $\Sigma^0_1$-complete) even when restricted to length-preserving
systems, i.e., systems where $|l| = |r|$ for all rules $l \to r \in \Delta$.

We now construct a many-one reduction to $\mathrm{PEP}_{\mathrm{co\&dir}}$. Let $S = (\Upsilon,\Delta)$,
$P_1$, $P_2$ be a length-preserving instance of the Reachability Problem. W.l.o.g., we assume
$\varepsilon \notin P_1$ and we restrict to reachability via an even and non-zero number of rewrite steps.
With any such instance we associate a $\mathrm{PEP}_{\mathrm{co\&dir}}$ instance
$u,v : \Sigma^* \to \Gamma^*$ with $R \in \mathrm{Reg}(\Sigma)$ such that the following correctness
property holds:

$\exists x \in P_1, \exists y \in P_2, \exists m$ s.t. $x \to^m_\Delta y$ (and $m > 0$ is even)
iff $\exists \sigma \in R$ s.t. $u(\tau) \sqsubseteq v(\tau)$ and $u(\tau') \sqsubseteq v(\tau')$ for all decompositions $\sigma = \tau\tau'$. $\quad$(CP)

The reduction uses letters like $a$, $b$ and $c$ taken from $\Upsilon$, and adds $\dagger$ as an extra
letter. We use six copies of each such "plain" letter. These copies are obtained by priming and
double-priming letters, and by overlining. Hence the six copies of $a$ are
$a, a', a'', \bar{a}, \bar{a}', \bar{a}''$. As expected, for a "plain" word (or alphabet) $x$, we write
$x'$ and $\bar{x}$ to denote a version of $x$ obtained by priming (respectively, overlining) all its
letters. Formally, letting $\Upsilon_\dagger$ be short for $\Upsilon \cup \{\dagger\}$, one has
$\Sigma = \Gamma = \Upsilon_\dagger \cup \Upsilon'_\dagger \cup \Upsilon''_\dagger \cup \bar{\Upsilon}_\dagger \cup \bar{\Upsilon}'_\dagger \cup \bar{\Upsilon}''_\dagger$.
We define and explain the reduction by running it on the following example:

$\Upsilon = \{a,b,c\}$ and $\Delta = \{ab \to bc, cc \to aa\}$. $\quad(S_{\mathrm{exmp}})$

Assume that $abc \in P_1$ and $baa \in P_2$. Then $P_1 \to^*_\Delta P_2$ since $abc \to^*_\Delta baa$ as
witnessed by the following (even-length) derivation $\pi$ = "$abc \to_\Delta bcc \to_\Delta baa$". In our
reduction, a rewrite step like "$abc \to_\Delta bcc$" appears in the PEP solution $\sigma$ as the
letter-by-letter interleaving $a\bar{b}b\bar{c}c\bar{c}$, denoted $abc \sqcup\!\sqcup \bar{b}\bar{c}\bar{c}$,
of a plain string and an overlined copy of a same-length string.
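The letter-by-letter interleaving used to encode rewrite steps can be illustrated as follows. In this Python sketch, which is ours and not part of the reduction itself, overlined letters are rendered as uppercase letters purely as an ASCII convention.

```python
def interleave(x, y):
    """Letter-by-letter interleaving of two words of the same length."""
    if len(x) != len(y):
        raise ValueError("interleaving is only defined for same-length words")
    return "".join(a + b for a, b in zip(x, y))


def encode_step(x, y):
    """Encode a rewrite step x -> y, writing the overlined copy of y in uppercase."""
    return interleave(x, y.upper())


print(encode_step("abc", "bcc"))  # aBbCcC, standing for abc interleaved with overlined bcc
```

Because the rewrite systems considered here are length-preserving, both sides of a step always have the same length, so the interleaving is well defined.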

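Write $T_{\triangleright}(\Delta)$, or just $T_{\triangleright}$ for short, for the set of all
$x \sqcup\!\sqcup \bar{y}$ such that $x \to_\Delta y$. Obviously, and since we are dealing with
length-preserving systems, $T_{\triangleright}$ is a regular language, as seen by writing it as
$T_{\triangleright} = \bigl(\sum_{a \in \Upsilon} a\bar{a}\bigr)^* . \{l \sqcup\!\sqcup \bar{r} \mid l \to r \in \Delta\} . \bigl(\sum_{a \in \Upsilon} a\bar{a}\bigr)^*$,
where $\{l \sqcup\!\sqcup \bar{r} \mid l \to r \in \Delta\}$ is a finite, hence regular, language.

$T_{\triangleright}$ accounts for odd-numbered steps. For even-numbered steps like $bcc \to_\Delta baa$ in
$\pi$ above, we use the symmetric $b\bar{b}a\bar{c}a\bar{c}$, i.e., $baa \sqcup\!\sqcup \bar{b}\bar{c}\bar{c}$.
Here too $T_{\triangleleft} \stackrel{\text{def}}{=} \{y \sqcup\!\sqcup \bar{x} \mid x \to_\Delta y\}$ is
regular. Finally, a derivation $\pi$ of the general form
$x_0 \to_\Delta x_1 \to_\Delta x_2 \cdots \to_\Delta x_{2k}$, where $K = |x_0| = \ldots = |x_{2k}|$, is
encoded as a solution $\sigma_\pi$ of the form
$\sigma_\pi = \rho_0 \sigma_1 \rho_1 \sigma_2 \ldots \rho_{2k-1} \sigma_{2k} \rho_{2k}$ that alternates
between the encodings of steps (the $\sigma_i$'s) in $T_{\triangleright} \cup T_{\triangleleft}$, and
fillers (the $\rho_i$'s) defined as follows:

$$\sigma_i \stackrel{\text{def}}{=} \begin{cases} x_{i-1} \sqcup\!\sqcup \bar{x}_i & \text{for odd } i,\\ x_i \sqcup\!\sqcup \bar{x}_{i-1} & \text{for even } i,\end{cases} \qquad \rho_i \stackrel{\text{def}}{=} \begin{cases} \dagger'^K \sqcup\!\sqcup \bar{x}'_i & \text{for odd } i,\\ x'_i \sqcup\!\sqcup \bar{\dagger}'^K & \text{for even } i \neq 0, 2k,\end{cases}$$

$$\rho_0 \stackrel{\text{def}}{=} x''_0 \sqcup\!\sqcup \bar{\dagger}''^K, \qquad \rho_{2k} \stackrel{\text{def}}{=} x''_{2k} \sqcup\!\sqcup \bar{\dagger}''^K.$$

Note that the extremal fillers $\rho_0$ and $\rho_{2k}$ use double-primed letters, when the internal
fillers use primed letters. Continuing our example, the $\sigma_\pi$ associated with the derivation
$abc \to_\Delta bcc \to_\Delta baa$ is

$$\sigma_\pi = \underbrace{a''\bar{\dagger}''b''\bar{\dagger}''c''\bar{\dagger}''}_{a''b''c'' \sqcup\!\sqcup \bar{\dagger}''\bar{\dagger}''\bar{\dagger}''}\; \underbrace{a\bar{b}b\bar{c}c\bar{c}}_{abc \sqcup\!\sqcup \bar{b}\bar{c}\bar{c}}\; \underbrace{\dagger'\bar{b}'\dagger'\bar{c}'\dagger'\bar{c}'}_{\dagger'\dagger'\dagger' \sqcup\!\sqcup \bar{b}'\bar{c}'\bar{c}'}\; \underbrace{b\bar{b}a\bar{c}a\bar{c}}_{baa \sqcup\!\sqcup \bar{b}\bar{c}\bar{c}}\; \underbrace{b''\bar{\dagger}''a''\bar{\dagger}''a''\bar{\dagger}''}_{b''a''a'' \sqcup\!\sqcup \bar{\dagger}''\bar{\dagger}''\bar{\dagger}''}.$$

The point with primed and double-primed copies is that $u$ and $v$ associate them with different images.
Precisely, we define

$$\begin{array}{lllll} u(a) = a, & u(a') = \dagger, & u(\dagger') = \dagger, & u(a'') = \varepsilon, & u(\dagger'') = \varepsilon,\\ v(a) = \dagger, & v(a') = a, & v(\dagger') = w_\Upsilon, & v(a'') = a, & v(\dagger'') = w_\Upsilon, \end{array}$$

where $a$ is any letter in $\Upsilon$, and where $w_\Upsilon$ is a word listing all letters in $\Upsilon$.
E.g., $w_{\{a,b,c\}} = abc$ in our running example. The extremal fillers use special double-primed letters
because we want $u(\rho_0) = u(\rho_{2k}) = \varepsilon$ (while $v$ behaves the same on primed and
double-primed letters). Finally, overlining is preserved by $u$ and $v$: $u(\bar{x}) = \overline{u(x)}$ and
$v(\bar{x}) = \overline{v(x)}$.

P. Karandikar and Ph. Schnoebelen

This ensures that, for $i > 0$, $u(\sigma_i) \sqsubseteq v(\rho_{i-1})$ and $u(\rho_i) \sqsubseteq v(\sigma_i)$,
so that a $\sigma_\pi$ constructed as above is a direct solution. It also ensures
$u(\sigma_i) \sqsubseteq v(\rho_i)$ and $u(\rho_{i-1}) \sqsubseteq v(\sigma_i)$ for all $i > 0$, so that
$\sigma_\pi$ is also a codirect solution. One can check it on our running example by writing
$u(\sigma_\pi)$ and $v(\sigma_\pi)$ alongside:

$$\begin{array}{l|c|c|c|c|c} & \rho_0 & \sigma_1 & \rho_1 & \sigma_2 & \rho_2\\ \hline \sigma_\pi & a''\bar{\dagger}''b''\bar{\dagger}''c''\bar{\dagger}'' & a\bar{b}b\bar{c}c\bar{c} & \dagger'\bar{b}'\dagger'\bar{c}'\dagger'\bar{c}' & b\bar{b}a\bar{c}a\bar{c} & b''\bar{\dagger}''a''\bar{\dagger}''a''\bar{\dagger}''\\ u(\sigma_\pi) & \varepsilon & a\bar{b}b\bar{c}c\bar{c} & \dagger\bar{\dagger}\dagger\bar{\dagger}\dagger\bar{\dagger} & b\bar{b}a\bar{c}a\bar{c} & \varepsilon\\ v(\sigma_\pi) & a\,\bar{a}\bar{b}\bar{c}\,b\,\bar{a}\bar{b}\bar{c}\,c\,\bar{a}\bar{b}\bar{c} & \dagger\bar{\dagger}\dagger\bar{\dagger}\dagger\bar{\dagger} & abc\,\bar{b}\,abc\,\bar{c}\,abc\,\bar{c} & \dagger\bar{\dagger}\dagger\bar{\dagger}\dagger\bar{\dagger} & b\,\bar{a}\bar{b}\bar{c}\,a\,\bar{a}\bar{b}\bar{c}\,a\,\bar{a}\bar{b}\bar{c} \end{array}$$

There remains to define $R$. Since $\rho_0 \in (\Upsilon''\bar{\dagger}'')^+$, since
$\sigma_i \in T_{\triangleright}$ for odd $i$, etc., we let

$$R \stackrel{\text{def}}{=} (\Upsilon''\bar{\dagger}'')^+ . T_{\triangleright P_1} . (\dagger'\bar{\Upsilon}')^+ . \bigl(T_{\triangleleft} . (\Upsilon'\bar{\dagger}')^+ . T_{\triangleright} . (\dagger'\bar{\Upsilon}')^+\bigr)^* . T_{\triangleleft P_2} . (\Upsilon''\bar{\dagger}'')^+, \qquad (1)$$

where $T_{\triangleright P_1} \stackrel{\text{def}}{=} \{x \sqcup\!\sqcup \bar{y} \mid x \to_\Delta y \wedge x \in P_1\} = T_{\triangleright} \cap \{x \sqcup\!\sqcup \bar{y} \mid x \in P_1 \wedge |x| = |y|\}$
is clearly regular when $P_1$ is, and similarly for
$T_{\triangleleft P_2} \stackrel{\text{def}}{=} \{y \sqcup\!\sqcup \bar{x} \mid x \to_\Delta y \wedge y \in P_2\}$.
Since $\sigma_\pi \in R$ when $\pi$ is an even-length derivation from $P_1$ to $P_2$, we deduce that the
left-to-right implication in (CP) holds.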

We refer to Appendix E for a proof that the right-to-left implication also holds, which concludes
the proof of Theorem 17.



7 Concluding remarks 

We introduced partial directness in Post Embedding Problems and proved the decidability of $\mathsf{PEP}^{\mathrm{partial}}_{\mathrm{dir}}$
by showing that an instance has a solution if, and only if, it has a solution of length bounded by a
computable function of the input. This generalizes and simplifies earlier proofs for $\mathsf{PEP}$ and $\mathsf{PEP}_{\mathrm{dir}}$.
The added generality is non-trivial and leads to decidability for UCST, or UCS (that is, unidirectional
channel systems) extended with tests [10]. The simplification lets us deal smoothly with counting
or universal versions of the problem. Finally, we showed that combining directness and codirectness
constraints leads to undecidability.

Technical Appendix (not for the proceedings version)
A Proof of the Length Function Theorem 

PROOF. The set of all $k$-controlled $n$-bad sequences ordered with the prefix ordering is a tree,
with the empty sequence as root. (Note that any prefix of a $k$-controlled $n$-bad sequence is controlled
and bad itself, so that our sequences correspond to paths from the root of the tree.) The tree has no
infinite branches, otherwise we would read an infinite bad sequence along it, contradicting Higman's
Lemma. Furthermore the tree is finitely branching, since the sequences are $k$-controlled and $\Gamma$ is fixed.
By Kőnig's Lemma, the tree is then finite. We let $H(n,k,|\Gamma|)$ be the length of its longest branch. $H$
is computable since the tree can be constructed effectively from its root by listing the finitely many
ways a current $n$-bad sequence can be extended in a $k$-controlled way.
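The effective construction of the tree can be sketched as a depth-first exploration. This is only an illustrative sketch under assumed definitions that are not restated in this appendix: a sequence $x_1, x_2, \ldots$ is taken to be $k$-controlled when $|x_i| \le i \cdot k$ for all $i$, and $n$-bad when it contains no subsequence $x_{i_1} \sqsubseteq \cdots \sqsubseteq x_{i_n}$ with $i_1 < \cdots < i_n$. The function name `longest_bad_sequence` is hypothetical, and the search is only practical for very small parameters.

```python
from itertools import product

def is_subword(x, y):
    """Scattered-subword test: greedy leftmost embedding of x into y."""
    it = iter(y)
    return all(c in it for c in x)

def longest_bad_sequence(n, k, alphabet):
    """Length of the longest branch of the tree of k-controlled n-bad sequences.

    Assumed definitions: the i-th word (i = 1, 2, ...) has length at most i*k,
    and a sequence is n-bad if it has no increasing (w.r.t. subword ordering)
    subsequence of length n.
    """
    def words_up_to(max_len):
        for length in range(max_len + 1):
            for tup in product(alphabet, repeat=length):
                yield "".join(tup)

    def explore(seq, chain):
        # chain[i] = length of the longest increasing subsequence ending at seq[i]
        best = len(seq)
        for w in words_up_to((len(seq) + 1) * k):
            c = 1 + max((chain[i] for i, x in enumerate(seq) if is_subword(x, w)),
                        default=0)
            if c < n:  # the extended sequence is still n-bad
                best = max(best, explore(seq + [w], chain + [c]))
        return best

    return explore([], [])
```

Termination of `explore` is exactly the finiteness of the tree argued above. With these assumed definitions, `longest_bad_sequence(2, 1, "ab")` returns 4, witnessed for instance by the sequence a, bb, b, $\varepsilon$.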

The above $H$ is clearly monotonic, but anyone who doubts it can rather define

$$H'(n,k,s) \stackrel{\text{def}}{=} \max\{H(n',k',s') \mid 0 \le n' \le n \,\wedge\, 0 \le k' \le k \,\wedge\, 0 \le s' \le s\}.$$
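The passage from $H$ to $H'$ is a generic "monotone closure" and can be sketched in a few lines. The helper name `monotone_closure` is hypothetical, and `H` stands for any computable function of three natural numbers.

```python
from itertools import product

def monotone_closure(H):
    """Return H' with H'(n, k, s) = max of H over all (n', k', s') below (n, k, s)."""
    def H_prime(n, k, s):
        return max(H(n2, k2, s2)
                   for n2, k2, s2 in product(range(n + 1), range(k + 1), range(s + 1)))
    return H_prime
```

By construction the result dominates `H` pointwise and is monotonic in each argument, whatever `H` is.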

Finally, more elaborate notions of controlled sequences can be accommodated, as witnessed in [12],
as long as the tree of controlled bad sequences is finitely branching.

B Missing proofs from Section 3

B.1 Proof of Lemma 1

Items 1 to 5 are easy (or see [?, Section 3]). Item 6 is proved by induction on $k$. The claim is true
for $k = 1$; suppose it is true for $k = p$. Then $s x^{p+1} = s x^p x \sqsubseteq y^p t x \sqsubseteq y^p s x \sqsubseteq y^p y t = y^{p+1} t$. Item 7 is
obtained from item 6 by mirroring.

B.2 Proof of Lemma [?]: the sequence of left margins and the reversed sequence of
right margins are $K_u$-controlled

We prove that $|l_{g_i}| \le (i-1) \times K_u$ by induction, showing $|l_{g_1}| = 0$ and $|l_{g_i}| - |l_{g_{i-1}}| \le K_u$ for $i > 1$.

The base case $i = 1$ is easy: obviously $g_1 = 0$ since $0$ is a blue index, and $l_0 = \varepsilon$ since it is the
only suffix of $u_{0,0} = \varepsilon$, so that $|l_{g_1}| = 0$.

For the inductive step $i > 1$, we have two cases depending on whether $1 + g_{i-1}$ is blue or red. If
$1 + g_{i-1}$ is blue, then $g_i = 1 + g_{i-1}$ and $l_{g_i}$ is a suffix of $l_{g_{i-1}} u(\sigma(g_i))$, so that $|l_{g_i}| \le |l_{g_{i-1}}| + K_u$, which
proves the claim.

If $1 + g_{i-1}$ is red, then all positions from $1 + g_{i-1}$ to $g_i - 1$ are red too, and $l_{g_i}$ is a suffix of
$u(\sigma(g_i))$, so that $|l_{g_i}| \le K_u$, which proves the claim.

The reasoning for $|r_{b_i}|$ is similar:

If $b_{i+1} = 1 + b_i$, then both $b_i$ and the next index are red. Then $r_{b_i}$ is a prefix of $u(\sigma(b_i))\, r_{b_{i+1}}$ so
that $|r_{b_i}| \le K_u + |r_{b_{i+1}}|$.

If $b_{i+1} > 1 + b_i$, then $b_i + 1$ is blue and $r_{b_i}$ is a prefix of $u(\sigma(b_i))$ so that $|r_{b_i}| \le K_u$. In particular,
since $N = |\sigma|$ is blue, $b_{N_2} < N$ and $|r_{b_{N_2}}| \le K_u$.
Finally, $|r_{b_i}| \le (N_2 + 1 - i) \times K_u$. $\Box$

C Missing proofs from Section 4

C.1 Proof of Lemma [?]: Iteration lemma for blue indices

PROOF. Let $x$ be any suffix of $\sigma'$. We show that $u(x) \sqsubseteq v(x)$ when $x \in R'$ or $x = \sigma'$, which will
complete the proof. There are three cases, depending on how long $x$ is.

• $x$ is a suffix of $\sigma[a,N)$. Then $x$ is a suffix of $\sigma$ itself, and this case is trivial since $\sigma$ is a solution.

• $x$ is $\sigma[i,b)\,\sigma[a,b)^p\,\sigma[b,N)$ for some $p \ge 1$ and $a < i \le b$. Since $a$ and $b$ are congruent, $x \in R'$
implies $\sigma[i,N) \in R'$. Thus $u_{i,N} \sqsubseteq v_{i,N}$, hence $u_{i,b} \sqsubseteq v_{i,b} s_b$ (since $u_{b,N} \sqsubseteq v_{b,N}$). Then, using
$s_b \sqsubseteq s_a$, Lemma 9 and $s_b u_{b,N} \sqsubseteq v_{b,N}$, we get

$$\begin{aligned}
u(x) = u_{i,b} (u_{a,b})^p u_{b,N} &\sqsubseteq v_{i,b}\, s_b\, (u_{a,b})^p u_{b,N} \sqsubseteq v_{i,b}\, s_a\, (u_{a,b})^p u_{b,N} \\
&\sqsubseteq v_{i,b} (v_{a,b})^p s_b\, u_{b,N} \sqsubseteq v_{i,b} (v_{a,b})^p v_{b,N} = v(x).
\end{aligned}$$

• $x$ is $\sigma[i,a)\,\sigma[a,b)^k\,\sigma[b,N)$ for some $0 \le i < a$. Since $a$ and $b$ are congruent, $x \in R'$ (or $x = \sigma'$)
implies $\sigma[i,N) \in R'$ (or $\sigma[i,N) = \sigma$) so that $u_{i,N} \sqsubseteq v_{i,N}$, from which we deduce $u_{i,a} \sqsubseteq v_{i,a} s_a$ as in the
previous case. Then, using Lemma 9 and $s_b u_{b,N} \sqsubseteq v_{b,N}$, we get

$$\begin{aligned}
u(x) = u_{i,a} (u_{a,b})^k u_{b,N} &\sqsubseteq v_{i,a}\, s_a\, (u_{a,b})^k u_{b,N} \sqsubseteq v_{i,a} (v_{a,b})^k s_b\, u_{b,N} \\
&\sqsubseteq v_{i,a} (v_{a,b})^k v_{b,N} = v(x).
\end{aligned}$$

C.2 Proof of Lemma [?]: Iteration lemma for red indices

Let $x$ be any suffix of $\sigma'$. We show that $u(x) \sqsubseteq v(x)$ when $x \in R'$ or $x = \sigma'$, which will complete the
proof. There are three cases, depending on how long $x$ is.

• $x$ is a suffix of $\sigma[a,N)$. Then $x$ is a suffix of $\sigma$ itself, and this case is trivial since $\sigma$ is a solution.

• $x$ is $\sigma[i,b)\,\sigma[a,b)^p\,\sigma[b,N)$ for some $p \ge 1$ and $a < i \le b$. Since $a$ and $b$ are congruent, $x \in R'$
implies $\sigma[i,N) \in R'$. Thus $u_{i,N} \sqsubseteq v_{i,N}$, hence $u_{i,b} t_b \sqsubseteq v_{i,b}$ since $b$ is red. Using Lemma 1.5
again, we get $u_{a,b} t_b \sqsubseteq t_a v_{a,b}$, and then $(u_{a,b})^p t_b \sqsubseteq t_a (v_{a,b})^p$ with Lemma 1.7. Then

$$\begin{aligned}
u(x) = u_{i,b} (u_{a,b})^p u_{b,N} &\sqsubseteq u_{i,b} (u_{a,b})^p t_b\, v_{b,N} \\
&\sqsubseteq u_{i,b}\, t_a (v_{a,b})^p v_{b,N} \sqsubseteq u_{i,b}\, t_b (v_{a,b})^p v_{b,N} \sqsubseteq v_{i,b} (v_{a,b})^p v_{b,N} = v(x).
\end{aligned}$$

• $x$ is $\sigma[i,a)\,\sigma[a,b)^k\,\sigma[b,N)$ for some $0 \le i < a$ and $k \ge 1$. Since $a$ and $b$ are congruent, $x \in R'$ (or
$x = \sigma'$) implies $\sigma[i,N) \in R'$ (or $\sigma[i,N) = \sigma$) so that $u_{i,N} \sqsubseteq v_{i,N}$, from which we deduce $u_{i,a} t_a \sqsubseteq v_{i,a}$ as
in the previous case. Then

$$u(x) = u_{i,a} (u_{a,b})^k u_{b,N} \sqsubseteq u_{i,a} (u_{a,b})^k t_b\, v_{b,N} \sqsubseteq u_{i,a}\, t_a (v_{a,b})^k v_{b,N} \sqsubseteq v_{i,a} (v_{a,b})^k v_{b,N} = v(x). \qquad \Box$$

C.3 Proof of Lemma [?]: the reversed sequence of right margins and the sequence of
left margins are $K_v$-controlled

We start with blue indices and right margins.

(Footnote to Section C.2, on the use of Lemma 1.5 there: it is applied with $u = u_{a,b}$, $v = u_{b,N}$, $s = t_a v_{a,b}$, $t = v_{b,N}$.)

LEMMA 18. Suppose $a < b$ are two blue indices. Then $s_a$ is a prefix of $v_{a,b} s_b$.

PROOF. Both $s_a$ and $v_{a,b} s_b$ are prefixes of $v_{a,N}$, hence one of them is a prefix of the other. Assume,
by way of contradiction, that $v_{a,b} s_b$ is a proper prefix of $s_a$, say $s_a = v_{a,b} s_b x$ for some $x \ne \varepsilon$. Then
$s_a u_{a,N} \sqsubseteq v_{a,N}$ rewrites as $v_{a,b} s_b x u_{a,N} \sqsubseteq v_{a,b} v_{b,N}$. Cancelling $v_{a,b}$ on both sides gives $s_b x u_{a,N} \sqsubseteq v_{b,N}$,
i.e., $(s_b x u_{a,b}) u_{b,N} \sqsubseteq v_{b,N}$, which contradicts the definition of $s_b$.

We now show that $s_{g_{N_1}}, \ldots, s_{g_1}$ is $K_v$-controlled. $N$ is a blue index, and $|s_N| = 0$. For $i \in [0,N)$,
if both $i$ and $i+1$ are blue indices, then by Lemma 18 $|s_i| \le |s_{i+1}| + K_v$. If $i$ is blue and $i+1$ is red,
then it is easy to see that $s_i$ is a prefix of $v(\sigma_i)$, and hence $|s_i| \le K_v$. So we get that $s_{g_{N_1}}, \ldots, s_{g_1}$ is
$K_v$-controlled.

Now to red indices and left margins. $0$ is not a red index. For $i \in [0,N)$, if both $i$ and $i+1$ are
red, then it is easy to see that $t_{i+1}$ is a suffix of $t_i v(\sigma_i)$, and so $|t_{i+1}| \le |t_i| + K_v$. If $i$ is blue and $i+1$
is red, then $t_{i+1}$ is a suffix of $v(\sigma_i)$, and so $|t_{i+1}| \le K_v$. So we get that $t_{b_1}, \ldots, t_{b_{N_2}}$ is $K_v$-controlled.

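D Proof of Lemma [?]

Suppose $\sigma$ is a solution to the PCP problem. Then $\sigma \ne \varepsilon$ and $u(\sigma) = v(\sigma)$. Now $\sigma' \stackrel{\text{def}}{=} 12\sigma$ is a
solution to the partially codirected problem since $12\sigma \in R$, $u'(12\sigma) = \#u(\sigma) \sqsubseteq v'(12\sigma) = \#v(\sigma)$, and
$\sigma'$ has no suffix in $R'$ (indeed $2\sigma \notin R'$ since $|u(\sigma)| = |v(\sigma)|$).

Conversely, suppose $\sigma'$ is a solution to the partially codirected problem. Then $\sigma' = 12\sigma$ for
some $\sigma \ne \varepsilon$. Since $u'(\sigma') = \#u(\sigma) \sqsubseteq v'(\sigma') = \#v(\sigma)$, we have $u(\sigma) \sqsubseteq v(\sigma)$. If $|u(\sigma)| \ne |v(\sigma)|$, then
$2\sigma \in R'$, and so we must have $u'(2\sigma) = \#u(\sigma) \sqsubseteq v'(2\sigma) = v(\sigma)$. This is not possible as $\#$ does not
occur in $v(\sigma)$. So $|u(\sigma)| = |v(\sigma)|$, and $u(\sigma) = v(\sigma)$. Finally, $\sigma$ is a solution to the PCP problem.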

E Undecidability of $\mathsf{PEP}_{\mathrm{co\&dir}}$

In this section we prove the right-to-left half of (CP), which states the correspondence established by the reduction
defined in the proof of Theorem 17.

Assume that there is a $\sigma \in R$ such that $u(x) \sqsubseteq v(x)$ and $u(x') \sqsubseteq v(x')$ for all decompositions
$\sigma = xx'$. By definition of $R$ (see Eq. (1)), $\sigma \in R$ must be of the form

$$\rho_0\,\sigma_1\,\rho_1\,(\sigma_2\,\rho_2\,\sigma_3\,\rho_3) \cdots (\cdots\,\sigma_{2k-1}\,\rho_{2k-1})\,\sigma_{2k}\,\rho_{2k}$$

for some $k > 0$, with $\rho_0 \in (\Upsilon''\bar\dagger'')^+$, with $\sigma_i \in T_\to$ for odd $i$ and $\sigma_i \in T_\leftarrow$ for even $i$, etc. These $4k+1$
non-empty factors, $(\sigma_i)_{1 \le i \le 2k}$ and $(\rho_i)_{0 \le i \le 2k}$, are called the "segments" of $\sigma$, and numbered $s_0, \ldots, s_{4k}$
in order.

LEMMA 19. $u(s_p) \sqsubseteq v(s_{p-1})$ and $u(s_{p-1}) \sqsubseteq v(s_p)$ for all $p = 1, \ldots, 4k$.

PROOF. First note that the definition of $u$ and $v$ ensures that $u(s_p)$ and $v(s_p)$ use disjoint alphabets.
More precisely, all $u(\sigma_i)$'s and $v(\rho_i)$'s are in $(\Upsilon\bar\Upsilon)^*$, while the $v(\sigma_i)$'s and the $u(\rho_i)$'s are in $(\dagger\bar\dagger)^*$,
with the special case that $u(\rho_0) = u(\rho_{2k}) = \varepsilon$ since $\rho_0$ and $\rho_{2k}$ are made of double-primed letters.
Since $\sigma$ is a direct solution, $u(s_0 \ldots s_p) \sqsubseteq v(s_0 \ldots s_p)$ for any $p$, and even

$$u(s_0 \ldots s_p) \sqsubseteq v(s_0 \ldots s_{p-1}), \qquad (A_p)$$

since $v(s_p)$ has no letter in common with $u(s_p)$. We now claim that, for all $p = 1, \ldots, 4k$,

$$u(s_0 s_1 \ldots s_p) \not\sqsubseteq v(s_0 s_1 \ldots s_{p-2}), \qquad (B_p)$$

as we prove by induction on $p$. For the base case, $p = 1$, the claim is just the obvious $u(s_0 s_1) \not\sqsubseteq \varepsilon$. For
the inductive case $p > 1$, one combines $u(s_0 \ldots s_{p-1}) \not\sqsubseteq v(s_0 \ldots s_{p-3})$ (ind. hyp.) with $u(s_p) \not\sqsubseteq v(s_{p-2})$
(different alphabets) and gets $u(s_0 \ldots s_p) \not\sqsubseteq v(s_0 \ldots s_{p-2})$.

Combining $(A_p)$ and $(B_{p-1})$, i.e., $u(s_0 \ldots s_p) \sqsubseteq v(s_0 \ldots s_{p-1})$ and $u(s_0 s_1 \ldots s_{p-1}) \not\sqsubseteq v(s_0 s_1 \ldots s_{p-3})$,
now yields $u(s_p) \sqsubseteq v(s_{p-2} s_{p-1})$, hence $u(s_p) \sqsubseteq v(s_{p-1})$ since $u(s_p)$ and $v(s_{p-2})$ share no letter: we
have proved one half of the Lemma. The other half is proved symmetrically, using the fact that $\sigma$ is
also a codirect solution.



LEMMA 20. $|s_1| = |s_2| = \cdots = |s_{4k-1}|$.

PROOF (Idea). Since $u(s_p) \sqsubseteq v(s_{p-1})$, the special form of the segments (from the definition of $R$)
and the definition of $u$ and $v$ yield $|s_p| \le |s_{p-1}|$, hence $|s_0| \ge |s_1| \ge \cdots \ge |s_{4k-1}|$. With the other half
of Lemma 19, i.e., $u(s_{p-1}) \sqsubseteq v(s_p)$, one gets $|s_1| \le |s_2| \le \cdots \le |s_{4k}|$.

Now assume $i \in \{1, \ldots, 2k\}$ is odd. By definition of $R$, $\sigma_i \in T_\to$ is some $x_{i-1} \sqcup\!\sqcup \bar y_i$ with $x_{i-1} \to_\Delta y_i$,
and $\sigma_{i+1} \in T_\leftarrow$ is some $y_{i+1} \sqcup\!\sqcup \bar x_i$ with $x_i \to_\Delta y_{i+1}$. Furthermore, $\rho_i$ is some
$\dagger'^{\,\ell} \sqcup\!\sqcup \bar z'_i$. With Lemma 19
we deduce $y_i \sqsubseteq z_i$ and $x_i \sqsubseteq z_i$. With Lemma 20 we further deduce $|y_i| = |z_i| = |x_i|$, hence $y_i = x_i$.
A similar reasoning shows that $y_i = x_i$ also holds when $i$ is even, so that the steps $x_{i-1} \to_\Delta x_i$ can
be chained. Finally, we deduce from $\sigma$ the existence of a derivation $x_0 \to_\Delta x_1 \to_\Delta \cdots \to_\Delta x_{2k}$. Since
$\sigma_1 \in T_\to^{P_1}$ and $\sigma_{2k} \in T_\leftarrow^{P_2}$, we further deduce $x_0 \in P_1$ and $x_{2k} \in P_2$. Hence the existence of $\sigma$ entails
$P_1 \to_\Delta^* P_2$, which concludes the proof.



